Using Surface-Learning to improve Speech Recognition with Lipreading

Authors

Christoph Bregler

Stephen Omohundro

Yochai Konig

Nelson Morgan

Abstract

We explore multimodal recognition by combining visual lipreading with acoustic speech recognition. We show that combining the visual and acoustic cues of speech improves recognition performance significantly, especially in noisy environments. We achieve this with a hybrid speech recognition architecture consisting of a new visual learning and tracking mechanism, a channel-robust acoustic front end, a connectionist phone classifier, and an HMM-based sentence classifier. In this paper we focus on the visual subsystem, which is based on "surface-learning" and active vision models. Our bimodal hybrid speech recognition system has already been applied to a multi-speaker spelling task, and work is in progress to apply it to a speaker-independent spontaneous speech task, the "Berkeley Restaurant Project (BeRP)".
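
The abstract states that visual and acoustic cues are combined but does not say how. As a hedged illustration only, the sketch below shows one common way to fuse two streams of per-frame phone posteriors: a weighted log-linear combination followed by renormalization. This is not necessarily the authors' method. The function name `fuse_posteriors` and the stream weight `audio_weight` are hypothetical, and the idea of lowering the acoustic weight as noise rises is an assumption motivated by the abstract's remark about noisy environments.

```python
import numpy as np


def fuse_posteriors(p_audio, p_video, audio_weight=0.7, eps=1e-12):
    """Log-linearly combine per-frame phone posteriors from two modalities.

    Hypothetical helper (not from the paper).
    p_audio, p_video: arrays of shape (frames, phones), each row summing to 1.
    audio_weight: assumed stream weight in [0, 1]; one would lower it as
    acoustic noise increases so the visual stream counts for more.
    Returns fused posteriors of the same shape, each row summing to 1.
    """
    p_audio = np.asarray(p_audio, dtype=float)
    p_video = np.asarray(p_video, dtype=float)
    if p_audio.shape != p_video.shape:
        raise ValueError("streams must have the same (frames, phones) shape")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    # weighted sum of log-probabilities (eps avoids log(0))
    log_fused = (audio_weight * np.log(p_audio + eps)
                 + (1.0 - audio_weight) * np.log(p_video + eps))
    # subtract each row's max before exponentiating, for numerical stability
    log_fused -= log_fused.max(axis=1, keepdims=True)
    fused = np.exp(log_fused)
    return fused / fused.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    # one frame, three phones: audio cannot separate phones 0 and 1,
    # while the (hypothetical) lip features clearly favor phone 0
    audio = np.array([[0.45, 0.45, 0.10]])
    video = np.array([[0.80, 0.10, 0.10]])
    print(fuse_posteriors(audio, video, audio_weight=0.5))
```

In the example the fused distribution favors phone 0, which neither stream alone would decide confidently from audio. In a hybrid connectionist/HMM system, posteriors like these are typically divided by the phone priors to obtain scaled likelihoods before HMM decoding.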

Similar resources

Surface Learning with Applications to Lipreading

Most connectionist research has focused on learning mappings from one space to another (e.g. classification and regression). This paper introduces the more general task of learning constraint surfaces. It describes a simple but powerful architecture for learning and manipulating nonlinear surfaces from data. We demonstrate the technique on low dimensional synthetic surfaces and compare it to near...

Surface Learning with Applications to Lipreading

Most connectionist research has focused on learning mappings from one space to another (e.g. classification and regression). This paper introduces the more general task of learning constraint surfaces. It describes a simple but powerful architecture for learning and manipulating nonlinear surfaces from data. We demonstrate the technique on low dimensional synthetic surfaces and compare it to nea...
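
The surface-learning abstracts describe learning a constraint surface from data and then manipulating points with it, without algorithmic detail. The following is a simplified sketch under stated assumptions: the surface is approximated by a union of local linear patches (k-means centers plus a local PCA basis per cluster), and a point is constrained by orthogonally projecting it onto the patch with the nearest center. This is a deliberately basic, swapped-in technique and not necessarily the architecture of the cited papers, which may blend neighboring patches smoothly. The names `fit_local_patches` and `project_to_surface`, and the parameters `n_patches` and `dim`, are hypothetical.

```python
import numpy as np


def _assign(points, centers):
    # index of the nearest center for every point
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def fit_local_patches(points, n_patches=8, dim=1, n_iter=20, seed=0):
    """Approximate a constraint surface by a union of local linear patches.

    Hypothetical helper: k-means groups the data, then local PCA (an SVD
    of each centered cluster) gives a `dim`-dimensional tangent basis.
    Returns a list of (center, basis) pairs; basis has shape (dim, D).
    """
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # fancy indexing copies, so updating centers never modifies points
    centers = points[rng.choice(len(points), n_patches, replace=False)]
    for _ in range(n_iter):
        labels = _assign(points, centers)
        for k in range(n_patches):
            members = points[labels == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)
    labels = _assign(points, centers)
    patches = []
    for k in range(n_patches):
        members = points[labels == k]
        if len(members) <= dim:
            continue  # too few points to estimate a tangent space
        mean = members.mean(axis=0)
        # top right-singular vectors span the locally dominant directions
        _, _, vt = np.linalg.svd(members - mean, full_matrices=False)
        patches.append((mean, vt[:dim]))
    return patches


def project_to_surface(x, patches):
    """Orthogonally project x onto the patch whose center is nearest."""
    x = np.asarray(x, dtype=float)
    center, basis = min(patches, key=lambda p: np.linalg.norm(x - p[0]))
    coeffs = basis @ (x - center)  # coordinates in the patch's tangent basis
    return center + coeffs @ basis


if __name__ == "__main__":
    # noisy samples from a 1-D surface (the unit circle) in 2-D
    rng = np.random.default_rng(0)
    angles = rng.uniform(0.0, 2 * np.pi, 500)
    data = np.column_stack([np.cos(angles), np.sin(angles)])
    data += 0.01 * rng.normal(size=data.shape)
    patches = fit_local_patches(data, n_patches=12, dim=1)
    print(project_to_surface([1.2, 0.1], patches))
```

In a lipreading setting one would plausibly use feature vectors describing tracked lip contours as `points` and an assumed intrinsic dimension as `dim`; projecting a noisy tracker estimate onto the learned surface then pulls it toward shapes seen in training. More patches give a closer piecewise-linear fit at the cost of needing more data per patch.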

Learning Visual Models for Lip Reading

This chapter describes learning techniques that are the basis of a "visual speech recognition" or "lipreading" system. Model-based vision systems currently have the best performance for many visual recognition tasks. For geometrically simple domains, models can sometimes be constructed by hand using CAD-like tools. Such models are difficult and expensive to construct, however, and are inadeq...

Visual gesture variability between talkers in continuous visual speech

Recent adoption of deep learning methods in machine lipreading research gives us two options for improving system performance. Either we develop end-to-end systems holistically, or we experiment to further our understanding of the visual speech signal. The latter option is more difficult, but this knowledge would enable researchers to both improve systems and apply the new kn...

LCANet: End-to-End Lipreading with Cascaded Attention-CTC

Machine lipreading is a special type of automatic speech recognition (ASR) which transcribes human speech by visually interpreting the movement of related face regions, including the lips, face, and tongue. Recently, deep-neural-network-based lipreading methods have shown great potential and have exceeded the accuracy of experienced human lipreaders on some benchmark datasets. However, lipreading is still...

Publication date: 2009
